Coarse Grained FPGA Overlay for Rapid Just-In-Time Accelerator Compilation
Abstract
Coarse-grained FPGA overlays built around the runtime programmable DSP blocks in modern FPGAs can achieve high throughput and improved scalability compared to traditional overlays built without detailed consideration of the FPGA architecture. These overlays can be mapped to using higher level compilers, achieving fast compilation, software-like programmability and run-time management, and high-level design abstraction. OpenCL allows programs running on a host computer to launch accelerator kernels which can be compiled at run-time for a specific architecture, thus enabling portability. However, the prohibitive hardware compilation times of traditional FPGA design flows mean that these tools cannot effectively use just-in-time (JIT) compilation or performance scaling on FPGAs. We present a methodology for mapping dataflow graphs expressed as OpenCL kernels onto coarse-grained overlays. The methodology benefits from the high-level abstraction afforded by the OpenCL programming model, while mapping to the overlay significantly reduces compilation and load times. Key characteristics of this work include highly performant DSP-optimized functional units that scale to large devices and the ability to perform automatic resource-aware kernel replication up to the size of the overlay. We demonstrate place and route times orders of magnitude better than HLS flows, even when running on an embedded processor on the Xilinx Zynq.
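The abstract mentions resource-aware kernel replication up to the size of the overlay but gives no algorithm. The sketch below is only an illustration of the general idea, not the authors' method: it assumes replication is bounded by two hypothetical budgets, the number of functional units and the number of I/O ports, and takes the tighter of the two. The names `Budget` and `replication_factor` and the example numbers are all invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Hypothetical resource description (not taken from the paper)."""
    functional_units: int  # DSP-based functional units available or required
    io_ports: int          # input/output ports available or required


def replication_factor(kernel: Budget, overlay: Budget) -> int:
    """Return how many copies of `kernel` fit in `overlay`.

    Assumption: each copy consumes a fixed number of functional units and
    I/O ports, so the tighter of the two budgets limits replication.
    """
    if kernel.functional_units <= 0 or kernel.io_ports <= 0:
        raise ValueError("kernel must use at least one unit of each resource")
    by_units = overlay.functional_units // kernel.functional_units
    by_ports = overlay.io_ports // kernel.io_ports
    return min(by_units, by_ports)


if __name__ == "__main__":
    overlay = Budget(functional_units=64, io_ports=32)  # made-up overlay size
    kernel = Budget(functional_units=10, io_ports=6)    # made-up kernel cost
    print(replication_factor(kernel, overlay))  # limited by I/O: 32 // 6 = 5
```

A real mapper would also have to account for routing and placement constraints, which this sketch ignores.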
Related resources
Resource-Aware Just-in-Time OpenCL Compiler for Coarse-Grained FPGA Overlays
FPGA vendors have recently started focusing on OpenCL for FPGAs because of its ability to leverage the parallelism inherent to heterogeneous computing platforms. OpenCL allows programs running on a host computer to launch accelerator kernels which can be compiled at run-time for a specific architecture, thus enabling portability. However, the prohibitive compilation times (specifically the FPGA...
QUKU: A Coarse Grained Paradigm for FPGA
To fill the gap between increasing demand for reconfigurability and performance efficiency, coarse grain reconfigurable architectures are seen to be an emerging platform. The advantage lies in quick dynamic reconfiguration and power efficiency. Despite having these advantages they have failed to show their mark. This paper describes the QUKU architecture, which uses a coarse-grained dynamically ...
Just-in-time Compilation for Generalized Parsing
Parsing syntactically extensible languages requires generalized parsers which are slow to generate for repeatedly changing grammars. This situation is similar to the execution of dynamic languages like JavaScript, suggesting that we can appropriate technology from that field to use in just-in-time compiled parsers. We implement two just-in-time compiling grammar interpreters, a simple one and a ...
A Coarse-Grain FPGA Overlay for Executing Data Flow Graphs
We explore the feasibility of using a coarse-grain overlay to transparently and dynamically accelerate the execution of hot segments of code that run on soft processors. The overlay, referred to as the Virtual Dynamically Reconfigurable (VDR), is tuned to realize data flow graphs in which nodes are machine instructions and the edges are inter-instruction dependences. A VDR consists of an array ...
Output Serialization for FPGA-based and Coarse-grained Processor Arrays
This paper deals with the mapping of loop programs onto processor arrays either implemented in an FPGA or available as (reconfigurable) coarse-grained processor architectures. Usually the proportion of processing elements to I/O-interfaces is much higher whereby problems of data transportation and synchronization are arising. In this realm, we propose a systematic approach in order to feed-out ...
Journal
Journal title: IEEE Transactions on Parallel and Distributed Systems
Year: 2022
ISSN: 1045-9219, 1558-2183, 2161-9883
DOI: https://doi.org/10.1109/tpds.2021.3116859